3 Algorithms for Binary Neural Networks

3.1 Overview

Binarization, the focus of this book, is the most extreme form of quantization. During binarization, data can take only one of two possible values, −1 (or 0) and +1, making it a 1-bit quantization. Both weights and activations can be represented by a single bit, so network compression does not consume a lot of memory. In addition, binarization replaces costly matrix multiplications with lighter bitwise XNOR and Bitcount operations. Compared with alternative compression techniques, binary neural networks (BNNs) therefore offer a variety of hardware-friendly advantages, such as significant acceleration, memory savings, and power efficiency. The usefulness of binarization has been demonstrated by ground-breaking work such as BNN [99] and XNOR-Net [199], with XNOR-Net achieving up to a 58× speedup on CPUs and up to 32× memory savings for a 1-bit convolution layer.
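
To make the arithmetic concrete, the following is a minimal NumPy sketch (the function names are illustrative, not taken from any of the cited works) of why a dot product between ±1-valued vectors reduces to an XNOR followed by a population count: XNOR counts the positions where the two operands agree, and the dot product equals agreements minus disagreements.

```python
import numpy as np

def binarize(x):
    # Sign binarization: map real values to {-1, +1} (zero is sent to +1 here).
    return np.where(x >= 0, 1, -1)

def dot_xnor_popcount(a_bits, b_bits):
    # a_bits, b_bits: boolean arrays encoding +1 as True and -1 as False.
    # dot = (#agreements) - (#disagreements) = 2 * popcount(XNOR(a, b)) - n
    n = a_bits.size
    agreements = np.count_nonzero(~(a_bits ^ b_bits))  # XNOR, then popcount
    return 2 * agreements - n

rng = np.random.default_rng(0)
w, x = rng.standard_normal(64), rng.standard_normal(64)
wb, xb = binarize(w), binarize(x)

reference = int(np.dot(wb, xb))              # ordinary multiply-accumulate
bitwise = dot_xnor_popcount(wb > 0, xb > 0)  # XNOR + Bitcount version
assert reference == bitwise
```

On real hardware the boolean vectors are packed into machine words, so a single XNOR instruction followed by a popcount replaces dozens of multiply-accumulate operations, which is where the reported speedups largely come from.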

Following the BNN paradigm, a great deal of research has been done on this topic in recent years in the computer vision and machine learning communities [84, 201, 153], and it has been applied to a variety of everyday tasks, including image classification [48, 199, 159, 196, 267, 259], detection [263, 240, 264, 260], point cloud processing [194, 261], object re-identification [262], etc. By converting a layer from full precision to 1 bit, binarization also offers an intuitive way to probe the significance of that layer: if performance suffers noticeably after binarizing a particular layer, we can infer that the layer lies on the network's sensitive path. From the perspective of explainable machine learning, it is also essential to determine whether full-precision and binarized models behave similarly.
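
As a concrete, if schematic, illustration of this layer-wise probe, the sketch below binarizes one layer at a time and records the accuracy drop. Here eval_fn, binarize_layer_fn, and layer_names are placeholders for whatever evaluation and binarization routines a particular codebase provides, not functions from the works cited above.

```python
import copy

def layer_sensitivity(model, eval_fn, binarize_layer_fn, layer_names):
    """Rank layers by how much accuracy is lost when only that layer is binarized."""
    baseline = eval_fn(model)
    drops = {}
    for name in layer_names:
        probe = copy.deepcopy(model)     # keep the original model untouched
        binarize_layer_fn(probe, name)   # binarize only this one layer
        drops[name] = baseline - eval_fn(probe)
    # Layers with the largest drop are the ones on the network's sensitive path.
    return dict(sorted(drops.items(), key=lambda kv: kv[1], reverse=True))
```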

In addition to developing methods for model binarization, numerous researchers have sought to shed light on the behavior of binarized models, as well as on the relationship between model robustness and the architecture of deep neural networks. Such work may help answer fundamental questions about which network topologies are preferable and how deep networks function. It is therefore worthwhile to study BNNs thoroughly, because they can help us better understand the behaviors and architectures of effective and reliable deep learning models. Some outstanding prior art reveals how the components of a BNN work. For example, Bi-Real Net [159] incorporates additional shortcuts (Bi-Real) to mitigate the information loss caused by binarization. This structure functions similarly to the ResNet shortcut [84], which helps explain why commonly used shortcuts can somewhat improve the performance of deep neural networks. Looking at the activations, one can observe that more specific information from the shallow layers is transmitted to the deeper layers during forward propagation. Conversely, to avoid the vanishing gradient problem, gradients can be propagated backward directly through the shortcut.
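
A simplified PyTorch-style sketch of such a block is given below. It is only an illustration of adding the real-valued input back after a 1-bit convolution, not the authors' implementation: Bi-Real Net itself additionally uses, among other things, a smoother approximation of the sign function's gradient, which this sketch replaces with a plain clipped straight-through estimator.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinActivate(torch.autograd.Function):
    # Sign in the forward pass, clipped straight-through gradient in the backward pass.
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_out):
        (x,) = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()  # pass gradients only where |x| <= 1

class BiRealStyleBlock(nn.Module):
    """1-bit convolution on binarized inputs, with the real-valued input
    added back through the shortcut (the Bi-Real idea)."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)

    def forward(self, x):
        xb = BinActivate.apply(x)          # binarized activations
        wb = torch.sign(self.conv.weight)  # binarized weights (scaling factors omitted)
        out = F.conv2d(xb, wb, padding=1)
        return self.bn(out) + x            # real-valued shortcut carries full-precision information
```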

Some ensemble approaches [301] improve BNN performance by building numerous groups of weak classifiers, but they occasionally run into overfitting issues. Based on analysis and experiments with BNNs, they demonstrated that the number of neurons trumps bit width
